BE: A Search Engine For NLP Research

Authors

  • Michael J. Cafarella
  • Oren Etzioni
Abstract

Many modern natural language processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries and are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries, resulting in unnecessary search engine load and in slow applications with limited scalability. In response, we have designed the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions (Cafarella and Etzioni, 2005). For example, in response to the query “powerful 〈noun〉”, BE will return all the nouns in its index that immediately follow the word “powerful”, sorted by frequency. (Figure 1 shows several possible BE queries.) In response to the query “Cities such as ProperNoun(Head(〈NounPhrase〉))”, BE will return a list of proper nouns likely to be city names.
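To make the typed-variable query concrete, the following is a minimal sketch of what a query such as “powerful 〈noun〉” asks for. It is not BE's implementation: the abstract does not describe BE's index or API, so this simply scans a small in-memory list of tokens. The corpus, its hand-assigned part-of-speech tags, and the function name `bind_typed_variable` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, hand-tagged toy corpus of (token, part-of-speech) pairs.
# A real system would tag and index a Web-scale corpus instead.
TAGGED_CORPUS = [
    ("a", "det"), ("powerful", "adj"), ("engine", "noun"),
    ("and", "conj"), ("a", "det"), ("powerful", "adj"), ("engine", "noun"),
    ("with", "prep"), ("powerful", "adj"), ("tools", "noun"),
    ("is", "verb"), ("powerful", "adj"), ("enough", "adv"),
]


def bind_typed_variable(tagged_tokens, literal, var_type):
    """Return (binding, count) pairs for the pattern `literal <var_type>`.

    A binding is any token tagged `var_type` that immediately follows
    `literal`. Results are sorted by descending frequency.
    """
    counts = Counter()
    # Walk over adjacent token pairs: (current token, next token).
    for (word, _), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() == literal and next_tag == var_type:
            counts[next_word.lower()] += 1
    return counts.most_common()


results = bind_typed_variable(TAGGED_CORPUS, "powerful", "noun")
print(results)  # [('engine', 2), ('tools', 1)]
```

Note that “enough” is not returned, because it is tagged as an adverb and so does not satisfy the type constraint. A single query of this kind stands in for the many separate keyword queries an application would otherwise have to issue, one per candidate noun.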

Similar resources

NLP for Search

This chapter presents an account of key NLP issues in search, sketches current solutions, and then outlines in detail an approach for deep-meaning representation, ontological semantic technology (OST), for a specific, complex NLP application: a meaning-based search engine. The aim is to provide a general overview on NLP and search, ignoring non-NLP issues and solutions, and to show how OST, as ...

A Study of Using Search Engine Page Hits as a Proxy for n-gram Frequencies

The idea of using the Web as a corpus for linguistic research is getting increasingly popular. Most often this means using Web search engine page hit counts as estimates for n-gram frequencies. While the results so far have been very encouraging, some researchers worry about what appears to be the instability of these estimates. Using a particular NLP task, we compare the variability in the n-g...

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Many metrics have traditionally been used to evaluate search engines, and researchers have proposed new ones in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. The purpose of this study was therefore to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...

First Classified Annotated Bibliography of NLP Tasks in the Burmese Language of Myanmar

Natural Language Processing (NLP) has emerged with a wide scope of research in the area. The Burmese language, also called the Myanmar language, is a resource-scarce, tonal, analytical, syllable-timed and principally monosyllabic language with Subject-Object-Verb (SOV) ordering. NLP of the Burmese language is also challenged by the fact that it has no white spaces and word boundaries. Keeping these ...

Integration of Agile Ontology Mapping towards NLP Search in I-SOAS

In this research paper we address the importance of Product Data Management (PDM) with respect to its contributions to industry. We also present some of the major challenges currently facing PDM communities and, targeting some of these challenges, present an approach, I-SOAS, and briefly discuss how it can help solve the PDM community's problems. Further...

On the Instability of Using Search Engine Page Hits as a Proxy for n-gram Frequencies

The idea of using the Web as a corpus for linguistic research is getting increasingly popular. Most often this means using page hit counts as an estimate for n-gram frequencies. While the results so far have been very encouraging, there are also some problems, the most important of which is the instability of these estimates. Using a particular NLP task, we find substantial variability in the n...


Publication date: 2006